Abstract 

Background: Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic 
variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and 
production performance that differ from those of foreign commercial pig breeds. However, the signatures of 
selection on genes underlying economic traits in Chinese indigenous and commercial pigs remain 
poorly understood. 

Results: We identified footprints of positive selection at the whole genome level using 44,652 SNPs 
genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical 
genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these 
nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 
candidate genes show strong evidence of positive selection. Furthermore, the results of network analyses showed 
that the genes that displayed evidence of positive selection were mainly involved in the development of tissues 
and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous 
and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern 
VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern 
VS Southern groups, respectively. 

Conclusions: In this study, we first identified the genomic regions that showed evidence of selection between 
Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be 
involved in the development of tissues and organs, the immune response, growth and litter size. The results of this 
study provide new insights into understanding the genetic variation and domestication in pigs. 
Background 

Pigs and humans have interacted for approximately 
10,000 years, and as a major protein source for 
humans, the pig is one of the most important domestic 
animals [1]. Domestic pigs originated from the 
Eurasian wild boar (Sus scrofa) approximately 9000 
years ago. European and Asian pigs were domesticated 
independently, and introgression of the Asian domestic 
pig into the European pig occurred after domestication 
[2,3]. Most of these breeds (especially commercial 
breeds) have been subjected to strong artificial selection 
to improve pork productivity. Different breeds show 
large differences in morphology and production 
performance owing to their various breeding objectives, 
selection systems and rearing environments; 
nevertheless, very little is known about the molecular 
mechanisms of artificial selection in pigs. 

The development of high-throughput sequencing and 
genotyping technologies makes it possible to investigate 
the selective pressures of various domestic animal spe- 
cies at the genomic level and to identify candidate genes 
associated with economic traits in order to better 
understand the mechanisms of adaptive evolution. For 
example, several important genes relevant to reproduction 
and growth, such as GHR and MC1R, have been 
identified in cattle [4-8], and Flori et al. [6] implemented a 
network analysis for detected genes that have been 
putatively subjected to selection. Akey et al. [9] identified 
155 regions in the canine genome that have likely been 
subjected to strong artificial selection, including the 
HAS2 gene, which is involved in skin wrinkling. The 
thyroid stimulating hormone receptor (TSHR) gene was 
identified as having undergone strong artificial selection 
in domestic chickens [10]. The above studies used several 
types of approaches, based on either the allele 
frequency spectrum or the properties of haplotype 
segregation in populations, to detect signals of recent 
positive selection on a genome-wide scale [8]. For 
example, Fst (a measure of population differentiation) 
provides an estimate of the genetic variability between 
populations: a locus that shows a significantly higher Fst 
than other loci provides evidence for 
positive selection [11]. Akey et al. [12] suggested that 
the loci in the tails of the empirical distribution of Fst be 
used as candidate targets of selection. Another method 
of identifying loci under selection is the EHH (Extended 
Haplotype Homozygosity) test [13], which identifies 
genomic regions that show unusually high LD given 
their allele frequency. 

The advent of the Illumina Porcine SNP60 BeadChip 
[14] allows for the investigation of selective pressure at 
the genome-wide level in pigs. The melanocortin 1 receptor 
gene (MC1R) was identified as a target of artificial 
selection related to coat colour in Chinese domestic pigs 
[15]. A missense mutation in the PPARD gene had an 
effect on the ear size of pigs [16]. China has a number 
of indigenous pig breeds, most of which are fat-type 
breeds that have been subjected to relatively little 
intensive breeding. Using Chinese indigenous breeds 
should therefore be a good way to obtain meaningful 
signatures of selection on genes underlying economic 
traits in the pig at the genomic level. 
Accordingly, the objective of this study was to identify 
regions subjected to recent artificial selection using a 
genome scan for SNP differences. The findings will con- 
tribute to the construction of a positive selection map, 
which could help us to understand the recent breeding 
history of different pig breeds. Our results will also fa- 
cilitate the identification of candidate genes that are im- 
portant for economic traits for breeding practices. 

Results 

Population structure and genome-wide distribution of Fst 

To examine the genetic structure of the studied 
populations, principal component analysis (PCA) was 
conducted based on all available SNP information. As 



shown in Figure 1, the first two components accounted 
for 42.43% and 8.94% of the variation, respectively. The 
Luchuan, Bama and Wuzhishan pigs were clustered 
closely, as were the Ningxiang and Tongcheng pigs and 
the Large White and Landrace pigs, while the Yutai and 
Laiwu were more distant from the other pig breeds. 

We constructed the empirical genome-wide distribu- 
tion of global Fst estimates based on 44,652 SNPs of the 
nine breeds (ALLPOP) in order to examine the inter- 
locus variation in allele frequencies (Figure 2). The aver- 
age Fst of these loci was 0.3717 with standard deviation 
0.16. Local environmental adaptation and artificial selec- 
tion can change the allele frequencies of specific loci: the 
frequency of advantageous alleles at the selected loci will 
increase, leading to a higher than expected level of 
population differentiation (Fst) [12]. The genome-wide 
distribution of Fst revealed selection in the pig genome. 
To identify specific genomic regions containing 
signatures of selection, we plotted Fst as a function of 
chromosome position (Figure 3). The sex chromosomes 
have a smaller effective population size than the 
autosomes, which makes them more sensitive to 
demographic events and/or natural selection [12]. 
Consistent with this, there was an unexpectedly high 
level of Fst at physical positions 40-80 Mb of the X 
chromosome. Taking into account the PCA results, the 
pairwise Fst between Chinese indigenous and European 
commercial breeds was calculated by merging the Chinese 
indigenous breeds and the commercial breeds into two 
groups (CHN VS EURO). In addition, the pairwise Fst 
between Northern (LW) and Southern Chinese 
indigenous breeds was calculated by merging LC, WZS 
and BM into one group (Northern VS Southern). 

Candidate genes under selection 

To identify loci subjected to selection, we applied the 
high-Fst outlier method to the empirical distribution 
of Fst: SNPs in the upper 1% of the distribution were 
taken as the loci under selection. In the ALLPOP group, 
a total of 446 SNPs were determined to be subjected to 
natural or artificial selection following this criterion, 
and these SNPs mapped to a total of 81 candidate genes 
(Additional file 1: Table S1). In addition, a total of 84 and 79 
candidate genes were identified in the CHN VS EURO 
group and the Northern VS Southern group, respectively 
(Additional file 2: Table S2 and Additional file 3: Table 
S3). Several candidate genes contain contiguous 
outlier-Fst SNPs; for example, the transient receptor 
potential melastatin 3 (TRPM3) gene contains five 
contiguous SNPs with Fst values that are consistently 
high, and the nuclear 
Figure 1 Principal component analysis results based on whole genome SNP data (235 individuals, 44,652 SNPs). Individuals are plotted 
according to their coordinates on the biplot of PC1 versus PC2. Breed abbreviations are described in the Methods. 



envelope spectrin repeat 2 (Nesprin-2) gene contained 
two outlier-Fst SNPs in the ALLPOP group. 
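
The upper-1% rule can be illustrated with a short sketch. It assumes that the per-SNP Fst values are already available in a dictionary and that the number of outliers is obtained by rounding down, which is consistent with 446 of 44,652 SNPs but is not stated in the text; the function name `high_fst_outliers` and the handling of ties are illustrative assumptions.

```python
import math
from typing import Dict, List


def high_fst_outliers(fst_by_snp: Dict[str, float], tail: float = 0.01) -> List[str]:
    """Return the SNP identifiers in the upper `tail` fraction of the
    empirical Fst distribution, ranked from highest to lowest Fst."""
    # Assumption: the outlier count is rounded down (floor).
    n_outliers = math.floor(len(fst_by_snp) * tail)
    ranked = sorted(fst_by_snp, key=fst_by_snp.get, reverse=True)
    return ranked[:n_outliers]


# Toy data: 200 hypothetical SNPs with increasing Fst values.
toy = {f"snp{i}": i / 200 for i in range(200)}
top = high_fst_outliers(toy)  # the two SNPs with the highest Fst
```

Because the threshold is taken from the empirical distribution itself, a fixed fraction of SNPs is always flagged, whether or not selection has acted; the outliers are therefore candidates rather than confirmed targets.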

Functional analysis of candidate genes under selection 

Based on a systems biology approach, we carried out 
network analysis using IPA software to identify the critical 
physiological pathways of the genes harbouring 
footprints of positive selection. The pig breeds selected have 
obvious differences in both morphology and perform- 
ance. The Large White and Landrace pigs are well- 
known commercial breeds with high meat productivity, 
fast growth, and high adaptability; however, Chinese in- 
digenous breeds vary in morphological and performance 



phenotypes and in local environmental suitability. For 
example, the Bama and Wuzhishan pigs from Southern 
China have a small body size, while the Laiwu pigs from 
Northern China are larger. First, 75 of the 81 genes in 
the ALLPOP group were mapped to the IPA database, 
and three significant networks, namely N1, N2 
and N3, were then constructed. N2 and N3 were 
interconnected and were further merged into a single network (N). 
Networks N and N1 are represented in Figures 4 and 5, 
respectively. The main hubs of the N network contained 
genes encoding protein kinases (Akt, Erk, Mapk, JAK2, 
PKC), transcription factors (NFkB, FOS), and several 
other signalling molecules (Insulin, CDKN1B, NR3C1, 
Figure 2 Genome-wide distribution of Fst in the ALLPOP group. 



Vegf). The N network contained 32 candidate genes 
under selection (CD274, DHRS9, DIAPH2, ERO1L, 
GLP2R, GNAQ, HDAC8, HS6ST2, IGFBP7, IL17RD, 
JAK2, KLF13, MOV10, OTX2, PAX6, PRKCQ, PTPLAD1, 
SMG5, SORBS2, TFAP2A, ZBTB10, BBS9, CORO2A, 
DACH2, GFI1B, MMP16, POU3F4, PRIM1, RECQL, 
SERPINA7, TNMD, TRIM14), and the N1 network contained 
29 candidate genes (AFF2, CEP78, COMMD8, EXT2, 
FAM184B, GCFC1, INO80D, KCNH5, KHDRBS3, LRP2BP, 
MED12, MLANA, MYO15A, NDST2, NDUFS1, NPAS2, 
OTC, PELI2, PHKA1, PSMB7, RGS22, SLC16A1, 
SLCO1A2, SYNE2, TIPRL, TRPM2, UBR2, UNC13C, WLS). 
These molecules were mainly involved in morphology, 
cellular function and maintenance, the cell cycle and 
signaling. The main hubs of the network contained IGF1R, JAK2 
and calmodulin in the CHN VS EURO group (Additional 
file 4: Figure S1) and ESR1, PKC and insulin in the Northern VS 
Southern group (Additional file 5: Figure S2). 
Discussion 

In this study, the population structure of the nine pig 
breeds was analyzed, and the PCA results showed that 
most of the individuals could be classified into their 
breeds using the first and second eigenvectors (Figure 1). 
As with other livestock species such as cattle and sheep 
[17,18], the combination of PC1 and PC2 separated 
individuals according to their geographic origin: of all the 
studied breeds, the indigenous breeds of Southern China 
(Wuzhishan, Bama, Luchuan) clustered together, as did 
the breeds of Central China (Ningxiang, Tongcheng); the 
Northern Chinese breed (Laiwu) and a developed breed 
(Yutai) formed a separate single cluster; and the two 
commercial breeds, the Large White and Landrace, formed a 
distinct cluster. There was almost no overlap between 
the nine different pig breeds. This opens the possibility 
that an informative SNP panel can be used to assign 
parentage, which has proven successful in cattle [19]. 

Pigs have been undergoing selection to enhance per- 
formance and productivity during domestication and 



breed formation. In the present study, global and 
pairwise Fst were used to detect selection in 
Chinese indigenous and commercial pig breeds. First, 
the ALLPOP group showed evidence of selection on 
chromosome 8 (Figure 3). We identified selection near 
KIT, which can affect coat colour in pigs when mutated 
[20] and also shows strong evidence of selection in sheep 
[18]. In addition, as shown in Figures 4 and 5, the N 
network contained several hubs involved with physiological 
signaling molecules (NFkB, MAPK, ERK). These data 
indicate that these genes participate in basic 
physiological processes. The N network also contained hubs 
(TNF and beta-estradiol), indicating that the genes under 
selection are involved in the immune response and 
reproductive traits. The POU3F4 and OTX2 genes are 
important for the development of the cochlea, and mutations of 
these two genes in mice cause developmental defects in 
the inner ear [21,22]. In mouse embryonic stem cells, 
a mutant form of the zinc-finger proto-oncogene GFI1B 
decreases erythropoiesis [23]. The 



Yang et al. BMC Genetics 2014, 15:7 
http://www.biomedcentral.com/1471-2156/15/7 

Figure 5 Representation of the gene network N1. Symbols corresponding to genes under selection are 
colored in grey. (Legend node types: Complex, Cytokine/Growth Factor, Chemical/Toxicant, Enzyme, 
Group/Complex/Other, Ion Channel, Kinase, Peptidase, Transcription Regulator, Transporter, Unknown.) 



PAX6 gene is necessary and sufficient to trigger the 
cascade of events required for eye formation [24]. The 
PAX6 and OTX2 genes also play important roles in the 
development of the body axis [25,26]. In addition, 
several identified molecules are involved in the 
development of organs. The GNAQ gene can regulate cardiac 
growth and development, and mice lacking both 
GNAQ and GNA11 [Gαq(-/-); Gα11(-/-)] died at 
embryonic day 11 due to cardiomyocyte hypoplasia [27]. 
The TFAP2A gene encodes a critical transcription factor for 
epidermal differentiation and interacts with Notch 
signaling molecules [28]. 

Several candidate genes in the ALLPOP group are also 
involved in molecular transport. For example, the 
SLCO1A2 and SERPINA7 genes can increase the 
transport of thyroid hormone in the serum [29,30]. The 
SLC16A1 gene plays an important role in the transport 
of mevalonate and ketone bodies [31,32]. Among the 
candidate genes under selection, some are associated 
with genetic disorders and cancer in humans. GWAS 
results showed that an SNP in BBS9 
was associated with amyotrophic lateral sclerosis [33]; 



furthermore, BANF2, SNX25, SAMD12 and GPR177 
were associated with Crohn's disease [34], and inflamma- 
tory bowel disease was associated with the upregulation 
of human CD274 at the cell surface from macrophage- 
derived dendritic cells of the inflamed colon [35]. In 
addition, three genes (DIAPH2, AFF2, POF1B) were in- 
volved in functions related to premature ovarian failure 
[36,37]. In the CHN VS EURO group, the network hubs 
are centred on the IGF1R (insulin-like 
growth factor 1 receptor) gene (Additional file 4: Figure 
S1), which is necessary for normal growth. IGF1R null 
mice die at birth of respiratory failure and exhibit only 
45% of the body weight of their wild-type littermates 
[38]. European commercial pig breeds grow faster 
than Chinese indigenous breeds, and 
the IGF1R gene also showed a strong signature of 
selection in European domestic pigs [39]. Interestingly, one 
of the most critical signalling molecules, JAK2, showed 
strong evidence of positive selection in both the ALLPOP 
and the CHN VS EURO groups (Figure 4 and Additional 
file 4: Figure S1). JAK2 is an essential gene in mammals 
and participates in a variety of biological processes; the 
loss of JAK2 is lethal [40]. JAK2 is also involved in the 
immune response [41], which suggests that 
these pig breeds may differ in their resistance to 
pathogens. In the Northern VS Southern group, the central 
hub of the network was the ESR1 (estrogen receptor 1) 
gene (Additional file 5: Figure S2). The ESR1 gene is 
associated with litter size in pigs and is also a 
candidate gene for boar fertility and sperm quality [42-44]. 
Laiwu pigs have a higher reproductive capacity 
than Bama and Wuzhishan pigs. Correspondingly, the 
ESR1 gene showed strong evidence of positive selection in 
our study. 

The high-Fst outlier method is a powerful tool for the 
detection of positive selection [45]; however, the high 
correlation between Fst estimates when loci are in strong 
linkage disequilibrium (LD) makes it difficult to determine 
whether the Fst at a particular SNP is markedly different 
from the expected value [46]. We also tested the correlation of 
Fst between pairs of SNPs as a function of marker 
distance; the correlation of Fst tended to drop quickly 
toward 0 when SNPs were more than 300 kb apart (data 
not shown). Modern pig breeds have a much larger average 
LD than humans and cattle [47]; 
therefore, this distance was greater in pigs than in humans 
and bovines [6,12]. 
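
The text does not describe how the correlation of Fst between SNP pairs was computed. One possible way to carry out such a check is sketched below, assuming that SNPs on one chromosome are given as (position, Fst) pairs and that pairs are grouped into fixed-width distance bins; the function names, the bin width and the maximum distance are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns NaN if either series has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def fst_correlation_by_distance(
    snps: List[Tuple[int, float]],
    bin_size: int = 100_000,        # assumed bin width in bp
    max_distance: int = 1_000_000,  # assumed largest distance examined
) -> Dict[int, float]:
    """Map each distance bin (keyed by its lower bound in bp) to the
    correlation of Fst between SNP pairs whose distance falls in that bin."""
    ordered = sorted(snps)
    bins: Dict[int, Tuple[List[float], List[float]]] = {}
    for i, (pos_i, fst_i) in enumerate(ordered):
        for pos_j, fst_j in ordered[i + 1:]:
            distance = pos_j - pos_i
            if distance > max_distance:
                break  # positions are sorted, so later SNPs are even farther away
            xs, ys = bins.setdefault((distance // bin_size) * bin_size, ([], []))
            xs.append(fst_i)
            ys.append(fst_j)
    # Bins with fewer than three pairs are skipped as uninformative.
    return {b: pearson(xs, ys) for b, (xs, ys) in sorted(bins.items()) if len(xs) >= 3}


# Toy chromosome: four tightly linked SNP pairs that share the same Fst.
toy = [(0, 0.1), (10, 0.1), (1000, 0.5), (1010, 0.5),
       (2000, 0.3), (2010, 0.3), (3000, 0.9), (3010, 0.9)]
result = fst_correlation_by_distance(toy, bin_size=100, max_distance=100)
```

Plotting the returned correlations against the bin lower bounds would show the distance at which neighbouring SNPs stop carrying redundant Fst information.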

Conclusions 

Overall, a genome-wide scan was performed in Chinese 
indigenous pigs to help interpret artificial selection and 
adaptive evolution. We characterized the population 
structure and constructed genome-wide distributions of Fst. A number 
of genes were identified as displaying signatures of selec- 
tion, and several critical physiological pathways of these 
genes were determined to have footprints of positive 
selection. Some of these genes play important roles in 
biological processes, which can be used to interpret the 
differences between these pig breeds. 

Methods 

DNA samples and SNP chip data quality control 

DNA samples were obtained from 235 pigs from nine 
breeds from different areas of China, including six 
indigenous breeds: Tongcheng (TC, n = 35), Bama (BM, 
n = 22), Laiwu (LW, n = 23), Wuzhishan (WZS, n = 25), 
Ningxiang (NX, n = 24), Luchuan (LC, n = 40), two 
commercial breeds, Landrace (n = 18) and Large White 
(n = 26), and one developed breed, Yutai (YT, n = 22). 

Genotyping was carried out using the Illumina Porcine 
SNP60 BeadChip [14], which contains a total of 62,123 
SNPs. Quality control was performed using the PLINK 
programme [48]. A total of 8,383 unmapped markers 
(based on Sus scrofa Build 9.0) were removed, and 8,391 
loci with a minor allele frequency (MAF) < 0.05 were 
excluded. A total of 2,709 markers that were 
genotyped in less than 90% of all individuals were also 
discarded from further analysis. The final data set consisted 
of 44,652 SNPs from nine breeds. 
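
The three filters can be expressed as a per-SNP check. The following is a plain-Python illustration of the thresholds quoted above, not PLINK's implementation; the genotype coding (copies of one allele, with `None` for a missing call) and the function names are assumptions made for the example.

```python
from typing import Optional, Sequence

# Assumed coding: copies of one allele (0, 1 or 2); None marks a missing call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of individuals with a non-missing genotype at one SNP."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Minor allele frequency computed from the non-missing genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_qc(
    genotypes: Sequence[Genotype],
    mapped: bool,
    min_maf: float = 0.05,
    min_call_rate: float = 0.90,
) -> bool:
    """Keep a SNP only if it is mapped, genotyped in at least 90% of
    individuals, and has a minor allele frequency of at least 0.05."""
    return (
        mapped
        and call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    )
```

In this sketch a SNP is dropped as soon as it fails any one filter, so a marker that is both unmapped and rare is counted only once among the discarded markers.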



Population structure and Fst estimation 

Principal component analysis (PCA) based on all available 
SNP information was performed using the SVS7 software 
(Golden Helix Inc., Bozeman, MT, USA). Fst statistics across 
populations were estimated using the Genepop 4.1 program 
[49]. Fst is a measure of population differentiation, which is 
defined as 

$$F_{ST} = \frac{MSP - MSI}{MSP + (n_c - 1)\,MSI + n_c\,MSG},$$

where MSG, MSI and MSP represent the mean sums 
of squares for gametes, individuals and populations 
computed by an analysis of variance, respectively, and 
$n_c = (S_1 - S_2/S_1)/(n - 1)$, where $S_1$ is the total sample size, 
$S_2$ is the sum of squared group sizes, and $n$ is the 
number of non-empty groups. 
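
As an illustration of this definition, the sketch below evaluates Fst = (MSP - MSI) / (MSP + (n_c - 1) MSI + n_c MSG) for a single biallelic SNP. It is not Genepop's code: it assumes complete genotypes coded as the number of copies (0, 1 or 2) of one allele, and it computes the mean squares from a nested analysis of variance on allele indicators (gametes within individuals within populations). The function name `anova_fst` is hypothetical.

```python
from typing import Sequence


def anova_fst(populations: Sequence[Sequence[int]]) -> float:
    """Fst for one biallelic SNP; each inner sequence holds one population's
    genotypes as copies (0, 1 or 2) of an arbitrarily chosen allele."""
    pops = [list(p) for p in populations if len(p) > 0]
    n = len(pops)                        # number of non-empty groups
    s1 = sum(len(p) for p in pops)       # total sample size
    s2 = sum(len(p) ** 2 for p in pops)  # sum of squared group sizes
    if n < 2 or s1 <= n:
        raise ValueError("need at least two groups and more individuals than groups")
    n_c = (s1 - s2 / s1) / (n - 1)

    grand_mean = sum(sum(p) for p in pops) / (2 * s1)
    ss_p = ss_i = ss_g = 0.0
    for p in pops:
        pop_mean = sum(p) / (2 * len(p))
        # Between populations: two gametes per individual.
        ss_p += 2 * len(p) * (pop_mean - grand_mean) ** 2
        for g in p:
            # Between individuals within a population.
            ss_i += 2 * (g / 2 - pop_mean) ** 2
            # Between gametes within an individual: only heterozygotes
            # contribute (gametes 0 and 1 each deviate by 0.5 from 0.5).
            if g == 1:
                ss_g += 0.5

    msp = ss_p / (n - 1)
    msi = ss_i / (s1 - n)
    msg = ss_g / s1
    denominator = msp + (n_c - 1) * msi + n_c * msg
    if denominator == 0:
        raise ValueError("SNP is monomorphic in the sample")
    return (msp - msi) / denominator


fixed = anova_fst([[2, 2, 2], [0, 0, 0]])         # populations fixed for different alleles
same = anova_fst([[0, 1, 1, 2], [0, 1, 1, 2]])    # identical allele frequencies
```

Two populations fixed for alternative alleles give an Fst of 1, whereas populations with identical genotype counts give a slightly negative estimate, which is expected for this estimator when there is no differentiation.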



Identification of candidate genes under selection 

Genome regions containing the high-Fst outliers of the 
empirical distribution of Fst were identified as 
follows: a locus was considered to be a 
high-Fst outlier if it fell within the upper 1% of 
the empirical genome-wide distribution of Fst. A gene 
was regarded as being under selection if it contained 
unexpectedly highly differentiated SNPs among the pop- 
ulations. All of the high-Fst outlier loci were mapped 
to gene-associated regions based on the pig genome 
annotation (Sus Scrofa Build 9.0 version). An SNP was 
considered to be from a particular gene if it mapped 
to either the 5' upstream, 5' UTR, coding, intronic, 
3'UTR, or 3' downstream region of the gene. 
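
The assignment of outlier SNPs to genes can be sketched as an interval lookup. The text does not state how far the upstream and downstream regions extend, so the `flank` value below is a hypothetical parameter, and the gene names and coordinates are invented for illustration rather than taken from the pig genome annotation.

```python
from typing import Dict, List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # first base of the transcribed region
    end: int    # last base of the transcribed region


def genes_for_snp(chrom: str, pos: int, genes: List[Gene], flank: int) -> List[str]:
    """Names of the genes whose body or flanking regions contain the SNP."""
    return [
        g.name
        for g in genes
        if g.chrom == chrom and g.start - flank <= pos <= g.end + flank
    ]


def candidate_genes(
    outliers: List[Tuple[str, int]], genes: List[Gene], flank: int = 5_000
) -> Dict[str, int]:
    """Count the outlier SNPs that fall in or near each gene."""
    counts: Dict[str, int] = {}
    for chrom, pos in outliers:
        for name in genes_for_snp(chrom, pos, genes, flank):
            counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical annotation and outlier positions.
annotation = [Gene("GENE_A", "1", 10_000, 20_000), Gene("GENE_B", "1", 50_000, 60_000)]
outlier_snps = [("1", 12_000), ("1", 15_000), ("1", 7_000), ("1", 40_000), ("2", 12_000)]
hits = candidate_genes(outlier_snps, annotation)
```

Returning a count per gene rather than a plain list makes it easy to pick out genes that carry several outlier SNPs.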



Network analysis of candidate genes 

Network analysis was aimed at searching for the 
direct or indirect interactions between candidate 
molecules and their related properties. The known 
interactions were annotated by experts according to 
the literature. Ingenuity Pathway Analysis (IPA) v7.0 
(Ingenuity Systems Inc., USA, http://www.ingenuity.com/) 
was used to construct networks. We uploaded 
the genes subjected to selection into this software 
and organized them into networks of interacting 
genes to identify pathways containing important 
functionally related genes. This network analysis 
approach was similar to that described by Flori et al. [6]. 
From the eligible candidate genes, IPA automatically 
constructed several networks, each limited to 
70 molecules (including candidate genes). 
Additional file 1: Table S1. Candidate genes under selection with SNPs 
in high Fst in group ALLPOP. 

Additional file 2: Table S2. Candidate genes under selection with SNPs 
in high Fst (CHN VS EURO). 

Additional file 3: Table S3. Candidate genes under selection with SNPs 
in high Fst (Northern VS Southern). 

Additional file 4: Figure S1. Representation of the gene network 
group CHN VS EURO. Symbols corresponding to genes under selection 
are colored in grey. 

Additional file 5: Figure S2. Representation of the gene network 
group Northern VS Southern. Symbols corresponding to genes under 
selection are colored in grey. 